Weakly-supervised text-to-speech alignment confidence measure

Authors

  • Guillaume Serrière
  • Christophe Cerisara
  • Dominique Fohr
  • Odile Mella
Abstract

This work proposes a new confidence measure for evaluating the outputs of text-to-speech alignment systems. Such alignment is a key component of many applications, including semi-automatic corpus anonymization, lip syncing, film dubbing, corpus preparation for speech synthesis, and acoustic model training for speech recognition. The confidence measure exploits deep neural networks that are trained on large corpora without direct supervision. It is evaluated on an open-source spontaneous speech corpus and outperforms a confidence score derived from a state-of-the-art text-to-speech aligner. We further show that this confidence measure can be used to fine-tune the output of this aligner and improve the quality of the resulting alignment.


Similar resources

Unsupervised and Lightly Supervised Part-of-Speech Tagging Using Recurrent Neural Networks

In this paper, we propose a novel approach to induce automatically a Part-Of-Speech (POS) tagger for resource-poor languages (languages that have no labeled training data). This approach is based on cross-language projection of linguistic annotations from parallel corpora without the use of word alignment information. Our approach does not assume any knowledge about foreign languages, making it...


Bitext Name Tagging for Cross-lingual Entity Annotation Projection

Annotation projection is a practical method to deal with the low resource problem in incident languages (IL) processing. Previous methods on annotation projection mainly relied on word alignment results without any training process, which led to noise propagation caused by word alignment errors. In this paper, we focus on the named entity recognition (NER) task and propose a weakly-supervised f...


Confidence Measures in Speech Emotion Recognition Based on Semi-supervised Learning

Even though the accuracy of predictions made by speech emotion recognition (SER) systems is increasing in precision, little is known about the confidence of the predictions. To shed some light on this, we propose a confidence measure for SER systems based on semi-supervised learning. During the semi-supervised learning procedure, five frequently used databases with manually created confidence l...


Semi-supervised speaker adaptation

We developed powerful unsupervised adaptation methods for speech recognition, i.e., the system improves its performance while the user uses it. No prior enrollment phase is necessary where the speaker has to read a given text. We tried to further improve the unsupervised adaptation by using confidence measures. These give an estimate of how likely the recognized words were correct. Adaptation t...


Speech Analysis in the Big Data Era

In spoken language analysis tasks, one is often faced with comparably small available corpora of only one up to a few hours of speech material mostly annotated with a single phenomenon such as a particular speaker state at a time. In stark contrast to this, engines such as for the recognition of speakers’ emotions, sentiment, personality, or pathologies, are often expected to run independent of...




Publication date: 2016